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Abstract 

We study the critical slowing down towards the continuum limit of lattice QCD simula- 
tions with Hybrid Monte Carlo type algorithms. In particular for the squared topological 
charge we find it to be very severe with an effective dynamical critical exponent of about 
5 in pure gauge theory. We also consider Wilson loops which we can demonstrate to de- 
couple from the modes which slow down the topological charge. Quenched observables 
are studied and a comparison to simulations of full QCD is made. In order to deal with 
the slow modes in the simulation, we propose a method to incorporate the information 
from slow observables into the error analysis of physical observables and arrive at safer 
error estimates. 
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1 Introduction 



In all Monte-Carlo methods, the control of statistical and systematic errors is the main 
requirement for reliable calculations. However, this is frequently made difficult by the 
phenomenon of critical slowing down, an increase in computational effort while ap- 
proaching critical points of a theory, beyond the naive scaling with the number of 
points of the system, due to an increase of auto-correlation times. At first sight, this 
might not seem a particularly appealing object of study. The auto-correlation times are 
not universal quantities, they depend on the particular discretization of the theory, the 
algorithms used and the correlation lengths. However, in order to control the statistical 
uncertainties and make certain that the simulation is sufficiently ergodic, it is pivotal 
to ensure that all auto-correlations are much shorter than the total run. The danger 
one faces in real-world simulations is that there are auto-correlations, which are much 
longer than the total statistics and therefore cannot be detected from the simulation 
itself. 

Our object of study is lattice QCD, for which recent years have witnessed significant 
progress in the algorithms. In particular, simulating light quarks on large volumes has 
become feasible on current computers and control over the chiral extrapolation has 
improved accordingly [lj[2j. Consequently, better control over the cut-off effects is the 
next target. Approaching the continuum limit means approaching a continuous phase 
transition and therefore critical slowing down is to be expected. The question is how 
severe it is and whether fine enough lattices can be reached. How fine a lattice is needed 
for sufficient control of the scaling violations depends again on the quantity to study, the 
discretization and also on the required accuracy. However, in particular if the physics 
of charm quarks is to be studied, at least lattices with a lattice spacing down to 0.04fm 
are required for precision physics. 

The severity of the critical slowing down depends on the algorithm and on the ob- 
servable in question. An observable with notoriously long auto-correlations for virtually 
all algorithms used for either pure Yang-Mills theory or QCD is the global topological 
charge. It has been studied over the years using link-update algorithms for pure gauge 
theory [3] and also in QCD with molecular dynamics based algorithms [4j|7]. However, 
let us stress that it is not the topological charge itself which is slow. Slowly moving 
modes of the transition matrix of the Markov process are just particularly prominent 
in this observable and therefore lead to the long auto-correlations. The same modes 
also couple to other observables and also their auto-correlation times are affected. The 
amount of coupling of the modes to the different observables is not known a-priori. 

This article serves two purposes. First we study the critical slowing down of various 
quantities, i.e. the topological charge, Wilson loops and hadronic correlation functions 
as the lattice spacing is varied over the range used in contemporary simulations. We will 
observe that among those only the charge is affected by very severe slowing down. If one 
assumes that the picture does not change drastically while going from the quenched the- 
ory to fully dynamical simulations, one can use the scaling laws of the auto-correlation 
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times from this study, to set minimal requirements for the total simulation time in full 
QCD. 

The second purpose of this paper is the question of how to deal with the presence 
of the slow modes in the data analysis. In particular we will propose a procedure to 
give conservative estimates of the statistical errors also in the situation where the slow 
mode contribution cannot be detected directly. 

In Sec. [2] we will therefore give the basics of the error analysis of Markov Chain 
Monte Carlo data. This will lay the ground for the improved error estimates in the 
presence of very slow modes of the Monte Carlo evolution. Preparing for the numerical 
results (Sec. |4l) we list the algorithms and observables that we study in Sec. [3} 



2 Error estimation 



We consider a Markov chain generated by a transition matrix 

M(q' <- q) (2.1) 

giving the probability for the change from a state q to a state q'. For simplicity we assume 
a discrete set of states q. The desired ensemble distribution, P(q), is an eigenvector of 
the transition matrix, Yl q ^— q)P(q) = P(l')- Ensemble averages of observables 

O a (q) are 

(O a ) = Y,O a (q)P(q). (2.2) 

q 

In the numerical application, after a suitable thermalization, we take a finite number of 
Monte Carlo steps N, yielding states q\, . . . , q^ and estimate 

1 N 

(O a ) = O a ±50 a , O a = ~Y J O a {q i ). (2.3) 

i=l 

The uncertainties 50 a = 0(l/y/N) and more generally those of functions F((0)) are 
given in terms of the auto-correlation function 



K 

T a p{t) = lim - [0«(ft+t) " (Oa)] [Op(q t ) - {Op)} (2.4) 



1=1 



and have to be estimated from the generated finite sequence qi,...,qw itself. This is 
done by evaluating the expression in eq. (2.4) for a finite but large K. For the estimate 
of the error of T see App. |A| 
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The formulae 



(£F) 2 = ^2r int (F), 4 = I>(0), (2.5) 

n a t(F) = l + jtpF(t), pF{t) = Yj§)> (2 - 6) 

T F (t) = Y J F*r a p(t)Fp, (2.7) 

are derived by a Taylor expansion of F in terms of (O a ) (8- 10 . For complicated func- 
tions F, the occurring derivatives = g ?Q > can be evaluated numerically 10 . 

The integrated auto-correlation time, ^^(i 7 ), characterizes the dynamics of the 
Monte Carlo process relevant for the observable F. It is difficult to determine, since the 
errors of T(t) remain roughly constant as a function of t. Therefore the proposed esti- 
mate of Madras and Sokal [9] and its generalization for functions of primary observables 
by Wolff involve a window W, 

1 w-i 

r- mt (F,W) = -+ (2-8) 
The window is chosen to balance the systematic error due to truncation, 

oo 

R F (W) = J2^F(W + t) (2.9) 



i=0 



with the statistical error. In particular |10| advocates the value of W which minimizes 
an estimatdT] 

E(W) = e~ w/Tw + 2y/W/N where t w rj S r mt (F, W) (2.11) 

for the sum of systematic and statistical relative error of Tint- S is a parameter, which 
by default is set to 1.5, and has to be adjusted by hand if other time scales, much larger 
than Ti n t are relevant. In other words, a proper choice of S requires an inspection of the 
particular shape of the auto-correlation function. 

We note that this criterion estimates the time scale for contributions to T m t(F) from 
t > W by Ti nt (F, W) itself. However, when the lattice spacing becomes small, the time 
scale which is relevant for the tails of auto-correlation functions can become significantly 
different from T m t(F) in lattice gauge theory simulations. We will see examples of this in 
Sect.jiJ Indeed, it can be shown that |IV(t)| < const. e _ */ Tcxp for any Markov chain 
An elegant proof is given in the cited reference. 
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The exact formula applied in 10 is 

l/(2n nt (F,W)) 



3 



It is usually assumed that the above bound is realized at large t vs. 

r F (t) t "£ x, A F e-*/ T "* (2.12) 

up to terms with a faster exponential decay. Indeed for algorithms which satisfy the 
detailed balance condition, 

M(q' <- q) P(q) = M(q <- q') P(q') , (2.13) 



amongst them most versions of the Hybrid Monte Carlo (HMC) algorithm |12| , eq. (2.12 ) 
can be proven. We turn to a brief discussion of auto-correlation functions in this more 
restricted class. 

2.1 Algorithms with detailed balance 



When eq. (2.13) is satisfied, it is convenient to introduce the symmetric matrix 

T(q, q') = [Ptf))- 1 ' 2 M(q' <- q) [P(q)] 1 / 2 , (2.14) 

which has real eigenvalues A n , n > 0, with Aq = 1 and |A n | < 1 for n > 1, assuming an 
ergodic algorithm. We order the eigenvalues as A n < A n _i. There is a complete set of 
eigenfunctions Xn(o) with xo(q) = [Pio)] 1 ^ 2 ■ Starting from the representation 

F a p(t) = [Op(q') - {Op)] M\q' <- q) \O a {q) - (O a )] P(q) (2.15) 
with M n+ V <-q) = J2 q " M W <~ q")M n (q" <- q), we then have 

T F (t) = Y, F « F ? TP**® ~ (O a )] [P(q)} 1/2 T\q,q') [P(q')] 1 / 2 [Op(q>) - (Op)] 

a,P q,q' 

= E( A «)*^(^)] 2 (2-16) 

n>l 

in terms of the "matrix elements" 

Vn (F) =^F a ^Xn(9)[P((?)] 1/2 [O a (q] - (O a )] . (2.17) 



We recognize eq. (2.12) with Ap = [r]i(F)] and r exp = — l/log(Ai) provided Ai > 0. In 
general all eigenmodes of the matrix T contribute to the above sum over n. 

However, exact symmetries may entail selection rules with rf n (F) vanishing for some 
n. As an example let us consider a parity symmetry q —> q' = S(q) with P(S(q)) = P(q) 
and S(S(q)) = q. It is a symmetry of the algorithm if 

T(S(q'),S(q))=T(q>,q). (2.18) 

With respect to the action of 5, the eigenfunctions Xn(o) of T can then be divided into 
even ones, Xn + (S(q)) = Xn + (l) and odd ones, Xn_ (>>(<?)) = — Xn_(<?)- Observables are 
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then also split into even (s = 1) and odd (s = —1), F s (0(S(q))) = sF s (0(q)) and have 
an auto-correlation function 



I>.(*)= £(*»•)* Wi)] 2 (2-19) 

n s >l 

with only even or odd contributions. Since the ensemble average of odd observables 
vanishes, one can restrict the attention to s = +1. 

Most versions of the HMC algorithm for QCD are invariant under ordinary parity, 
which means that it suffices to look at parity invariant observables to search for the 
relevant slowest mode. For our QCD studies we therefore consider Q 2 instead of the 
parity odd topological charge Q. 

We are now in the position to discuss improved error estimates, namely estimates 
which aim at giving a realistic and/or conservative estimate of the tail contribution 
eq. (2.9) to the error of F also in the situation when T exp is significantly larger than 



r m t(F). 



2.2 Improved error estimates 



Remaining with algorithms which satisfy detailed balance, we can start from eq. (2.16). 
For n > 1 we then have |A n | < 1 and Ylt^=o C^n)* = V(l ~~ ^n) > an d furthermore 
1/(1 - A n ) < 1/(1 - Ai). This yields bounds 



Rf{W) < Y^(\n) W [r]n(F)] 2 = -T F (W) = r exp r F (VF) (1 + 0(l/r cxp )) , 

1-Aa^ 1-Ai 

(2.20) 

Rf(W) > ^^(A 1 ) w [r ?1 (F)] 2 = r exp e- H/ / Tc -(l + 0(l/r cxp )) (2.21) 
1 — Ai 

for even W. They translate into bounds on T[ n t(F). 

As long as the configuration space is large, we expect these bounds to hold quite 
generically, also for algorithms which do not satisfy detailed balance. Certainly Monte 
Carlo (MC) experiments that we have seen so far are in agreement with such a behaviour. 

Let us now assume that we are in a situation where the following is true 

1. There is some knowledge about r exp from previous MC runs or an extrapolation 
from other parameters of the simulated theory. 

2. The considered MC run is still long compared to T ex p itself, 

N > r exp , (2.22) 

but not so long that one can just sum up the auto-correlation function with a 
window W ~ T oxp . 

3. We are interested in an error estimate which safely includes the contribution rep- 
resented by the slow mode corresponding to r exp or slow modes n with A„ w Ai. 
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We propose to choose a window W\, according to the criterion of eq. ( 2.11 ) and explained 



in [10], with the parameter S set to its default value of 1.5 and the associated 

rUF) = r int (F,W l ) (2.23) 

as well as a second window W u where the auto-correlation function is still significant 
by, e.g., three standard deviations and add an estimate of the tail giving 

tSa(F) = r int (F, W u ) + T expPF (W u ) . (2.24) 

In cases where p F falls very quickly and is compatible with zero at short time t = Wq, 
e.g. Wo = 5, we replace this estimate by 

r£ t (F) = r int (F, Wo) + 2t cxp 5[ Pf (W )} for S[p F (W )] > p F (W ) , (2.25) 

where 5[p] is the estimated error of p. When one is interested in Ti n t(F) itself, e.g. 
for the investigation of algorithms, one should choose an interval covering T^ nt (F) and 
T^ t (F) together with their statistical errors. If on the other hand one just wants a safe 
estimate of the error of the observable we propose to choose T^ t (F). 

An additional issue is that in the presumed situation, it is also of interest to estimate 
how severely an observable F is affected by the slow mode(s). The ratio Tj^ t (F)/r exp is 
a possible measure, but to quantify this more precisely, it is better to try to isolate the 
contribution of the slowest mode. The corresponding normalized amplitude is 

One may object immediately that it is very difficult, if not impossible, to estimate r exp , 
which is needed in the above formulae. In Fig. [I] we therefore just show one numerical 
result already at this point: the "effective mass plot" from auto-correlation functions 
of a few observables. Details of the numerical simulation are described only later in 



Sect. 4.1 however, it is clear from the figure that considering several observables can help 
for getting a handle on the slow modes. Of course the statistics has to be large enough, 
but as an empirical observation, an early onset of the plateau in log(p(t)/p(t + r)) is 
beneficial when r exp is large. Furthermore, the whole proposal relies on the fact the 
slowest mode, and with it r exp , can actually be identified. Absolute certainty on this 
is virtually impossible to achieve, however, by looking at a large number of operators, 
at least a significant portion of the relevant space can be covered. Also in case there 
is an even slower mode than the one identified, the proposed method does provide a 
more conservative estimate of the contributions up to this threshold, and can therefore 
improve the analysis. 

2.3 Decoupling and dynamical correlation coefficient 



Since r exp enters in the exponent in eq. (2.26), this representation is useful if r, 



1 exp 



is already known rather precisely - a rare luxury. A more practical representation 
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Figure l: Effective mass plot }- log(pF(t)/pp(t + r)) for two observables F in run C3d. 
Here r = 6.75 is the spacing between consecutive measurements. 



replaces r ex p by an effective one. To this end, we take observables Op which couple 
relatively strongly to the slow MC mode. For QCD possible choices are the square of 
the topological charge Op = Q 2 a or the smeared plaquette Op = P a with a labelling 
different smearing levels, see Sect. [U for details. We can use 



CW = p WumY > ( 2 - 27 ) 

2 log A/r ov „ ggwgj \ 



Max 



P P P {t) / 



but clearly other choices are possible. The effective coefficient 

Cf(t) = PF (t)e t/T ^ {t) . (2.28) 

then suggests itself. When detailed balance is guaranteed, a further effective estimator 
is 

where Tpcit) = Y^ a a F a T a p(t)Gp and we have assumed that G is an observable with a 
strong coupling to the slow mode. In other words Cg is large. This representation will 
be valid (at large t) if Ai is an isolated eigenvalue and in practice if indeed the critical 
slowing down is dominated by the single mode n = 1. It simply follows from the mode 
decomposition T FG (t) = S n >i(A n )* T} n (F)rj n (G). 

2 We remind the reader that in QCD with parity conserved, the whole discussion is to be restricted 
to parity even observables. 
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Clearly eq. (2.28) is more generic and even expected to be useful when detailed 



balance is not satisfied, but the advantage of eq. (2.29) is that it can possibly be used 
at much larger t, showing smaller statistical errors in that region. 

We can now define what we mean by decoupling of an observable from the slow 
mode n = 1: in practice it means Cp -C 1 while in terms of critical slowing down, it 
should be defined as a significant decrease of Cp as the correlation length and r exp grow, 
e.g. Cp ~ (correlation length) -7 with some positive 7. In MC runs this decoupling is 
expected to be visible in the behaviour of Cp(t) at moderate time t. Given the inherent 
problems in seeing asymptotic behaviour in numerical simulations, it is useful to go 
further and define a time scale r* through 

<U rr *) = T * ( 2 - 3 °) 

and 

C F (r) = Cf(m) . (2.31) 

In the same way, CJr may be replaced by Cf?. Using r* is similar in spirit to the original 
Sokal proposal for fixing the window of summation for the Tj n t by the point at which the 
summation window W exceeds a multiple of Ti nt (iy). A choice of r significantly smaller 
than one is necessary when the overall statistics is moderate. We emphasize again our 



condition eq. (2.22), however. The advantage of eq. (2.31) is that we do not have to 
consider asymptotically large t with their associated systematics. Decoupling can be 
studied at a fixed (not unreasonably small) value of r. If C* F {r) shows decoupling it will 
usually also be the case in Cp. 

2.3.1 Relation to static correlations 

In the language used here, the square of the standard correlation coefficient of observ- 
ables F and G id3 

^static _ t r FG(0)] 2 , n 

Cfg -r>(o)r G (or (2 ' 32) 

It is a static property, independent of the algorithm as only t = appears. We now 
notice that if G "is" approximately the slow mode, which precisely means 

\Vi(G)\ > \Vn(G)\ Vn> 1, (2.33) 

then we have Tpcit) ~ rji(F)r]i(G) and Tc(t) ~ [?7i(G)] 2 and therefore also 

Cp ps C s ^ tic . (2.34) 

We may therefore consider Cp to be the dynamical correlation coefficient between F and 
G. Generically it will be very different from the static one, emphasizing that the static 

3 Normally one will just consider primary observables, but the correlation coefficient of arbitrary 
functions F, G is a straightforward generalization. 
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correlation of the observable F to a slow one (e.g. the topological charge in QCD) is not 
the proper way to discuss the error of F. Rather the dynamical correlation coefficient 
Cf has to be used. 



3 Algorithms and observables under study 

For the numerical investigation of the HMC and DD-HMC, we now first give the basic 
definition of the algorithms and then of the observables which we choose to investigate. 

3.1 Algorithms 

In hybrid Monte Carlo [12] and related algorithms, the gauge fields are updated by 
solving the classical equations of motion associated with the Hamiltonian 

H = ^(U,U) + S(U) , (3.1) 



where the antihermitian H^ are the momenta conjugate to the gauge fields U x ,^- Their 

trTT 2 



scalar product is defined as (II, IT) = ^^u^L' With the Monte Carlo time r, 



the equations of motion then read 



d d 



Ha;,/x — F and ^x,n — ^-x,fj,U Xt fi , (3-2) 



where the force F fulfills {oj,F) = 5 W S(U) for infinitesimal variations of the gauge field 
SljUx^ = oj x JJ X a- In these definitions, we follow the ones used in the context of the 



DD-HMC [13] . We give them, because they fix the normalization of the trajectory 
length r, which is not unique in the literature. The conventions of Ref. [l4] used, e.g., 
in the MILC code result in a different normalization: a trajectory of length y/2 in the 
conventions above corresponds to a unit length trajectory in those of Ref. [14| . 

The difference between HMC and DD-HMC is that the latter introduces a decom- 
position of the lattice into blocks of size Bq x B\ x B<i x B%. During each trajectory, only 
the links are updated, which have at least one endpoint in the interior. The fraction of 
these "active" links is given by 



UloiBi - 2) + lT,Lo UjM B ^ ~ 2 ) 



R = lli=uv ' 4^.=onj^v j , _ (3 3) 



Since the active links are treated in exactly the same way as in HMC, naively, auto- 
correlation times will be proportional to the inverse of R. Therefore, we scale them 
in the following by this ratio, noting that in pure gauge theory also the cost of the 
simulation scales accordingly. 

At the end of each trajectory, the HMC algorithm has a global Metropolis accep- 
tance step to correct for the errors in the numerical integration of the equations of 
motion. For the DD-HMC in pure gauge theory with the Wilson gauge action, however, 
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the molecular dynamics evolution of the active links on each block is independent of 
the other blocks. We can therefore perform the Metropolis step for each block individ- 
ually]^] Compared to the conventional global acceptance step, a given acceptance rate 
can be achieved with a significantly larger step size. All our runs are done at accep- 
tance rates above 90%, and in this case, the block-wise acceptance does not influence 
the auto-correlation times of the pure gauge observables within errors. 

In order to be ergodic, all links of the lattice have to become active within some 
(composite) series of update steps. This is achieved by translating the domain decom- 
position relative to the lattice between trajectories. The scheme is described in detail 
in Ref. |13| and alternates random shifts with directed ones, the latter to increase the 
efficiency of this step. Because of the directed shifts, however, the full algorithm does 



not obey detailed balance. Even if eq. (2.20) can then not be shown mathematically, we 



still expect it to be valid at not too small t. In any case eq. (2.24) represents a useful 



estimate of the integrated auto-correlation time. The same reasoning holds true for the 



factorization behind eq. (2.29). 



3.2 Observables 

We want to study the effect of the critical slowing down of the (DD)-HMC algorithm 
on observables of interest for physics. We consider meson two-point functions, Wilson 
loops and the topological charge, which we will now define. In order to be more sensitive 
to the slow modes, we also computed some observables on smoothed gauge fields. For 
this purpose we apply up to five levels of HYP smearing [15] to the link variables. 

The slow evolution is very prominent in the topological charge, for which we use 
the naive gauge definition 



Q« = I^2 fl4 E tr > (3-4) 



X,fl,U 

where the lattice field strength tensor F^\x) is constructed from the clover leaf repre- 
sentation (see e.g. |16| for a definition) but from a times HYP smeared links, where we 
consider a < 5. We find little difference beyond the first iteration of smearing. 

As physics oriented observables, we compute Wi(h,l2), the Wilson loops of size 
l\ x ?2 after one level of HYP smearing, and the ones without smearing W(h,l2)- Only 
the plaquette P a = W a (a, a) is also considered with higher levels of smearing, a < 5. 

In order to study the effects of the slow modes on hadronic observables, we take as 
an example the correlators used in the quenched study of the D s meson at parameters 
of Ref. [l7] . We compute 

C P p(t) = a 3 ^(P rs (t,x)P-(0,0)) 

JL (3-5) 

X 

4 We thank M. Liischer for this suggestion. 
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with the pseudo-scalar density P rs = fj^s and the time component of the axial-vector 
current A T S = f7o75S. These are estimated on each configuration using the one-end 
method [l8p9] with 5 stochastic U{1) sources per configuration. Interesting observables 
are the effective meson mass m e ff, which is defined through 



C PP (t + a) cosh((t + a - T/2)m eS (t)) 



(3.6) 



Cpp(t — a) cosh((t — a — T/2)m e ff (t)) 
and the PCAC quark mass (d t f(t) = \{f{t + a) - f(t)) , d* t f{t) = \{f(t) - f(t - a))) 

\(d t + d$)C AV (t) + a c A dtd t C PP (t) 



m 



2C PP (t) 

For both masses, as well as for the decay constant, 



fi*® = lr Zf®,^ -« W/2 , (3-7) 
[Cpp(t) m cit (t)] 1 /' ! 

we average over a suitably chosen plateau in t. 
4 Results 

We have performed a considerable number of long simulations allowing for a study of 
the dependence of auto-correlations on several parameters. Table [l] presents an overview 
of the pure gauge theory simulations; on CI and C4 also quenched measurements were 
carried out. Most ensembles are lattices generated with the Wilson gauge action of 
constant volume L 4 with L = 2.2 fm, where the physical scale comes from r^/a of 



|20| with a nominal value of tq = 0.5 fm |21|. We complement this in Sect. 4.5 by a 
comparison to dynamical Nf = 2 QCD runs. 

4.1 Pure gauge theory 

Let us start the discussion of the results with the pure gauge ensembles of the Wilson 
gauge action at constant physical volume, with main interest on the dynamical critical 
slowing down of the topological charge and how it is reflected in other observables. Since 
we are in pure gauge theory, the Wilson loops will serve as prime reference. 

4-1-1 Lattice spacing dependence 

The critical slowing down in the square of the topological charge is rather dramatic as 
demonstrated in Fig. [2j where we show the normalized auto-correlation function for our 
four lattice spacings, all with trajectory length r = 4. The Monte Carlo time t is given 
in molecular dynamics units (MDU) multiplied by R. This unit is applied throughout 
this paper. 

From our data we also determine the integrated auto-correlation times by using the 



criterion given in Eq. (2.11), where we used the default value 5 = 1.5 unless specified 
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Table 1: Parameters of our runs. We give the bare coupling, the size of the lattice, 
the lattice spacing from r$ = 0.5 fm, the block decompostion in the DD-HMC, the 
corresponding fraction of active links R, the trajectory length r and the step size of the 
integration At along with the acceptance rate A and the total statistics in molecular 
dynamics units. Runs with blockwise acceptance step are marked with an asterisk on 
the step size. 
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Figure 2: Normalized auto-correlation function of Q\ at various lattice spacings. The 
Monte Carlo time is given in molecular dynamics units multiplied by R. 

differently. Results for the plaquette, charge and the square of the charge as well as 
auto-correlation times are shown in Table [2] Note that the average of the charge is 
compatible with zero in our long runs, which is an indication that the determination 
of uncertainties and auto-correlations is under control. We also observe a considerable 
difference between Ti nt (Q) and T; n t(Q 2 ), in line with the arguments given at the end of 
section I2.ll 

The main result of this section is shown in Fig.[4j where we give the auto-correlation 
times of Q\ and of Wi (0.5 fm, 0.5 fm) as a function of the lattice spacing. This Wilson 
loop is chosen since it is roughly at this size that we find the longest auto-correlation 
times, see Fig. |3j Creutz ratios behave very similarly. The observed maximum of t 1iA is 
surprising at first sight, but large Wilson loops are dominated by strong ultraviolet (UV) 
fluctuations and therefore have a large variance T(0) compared to their expectation 



value. In Sect. 4.3 we will consider other long distance observables with a smaller 
variance. 

We compare two ansatze to describe the behaviour of the auto-correlation times, 

Tmt(^) = h {a/r ) z and r mt (F) = k 2 exp(a/a) (4.1) 

where the first is the standard behavior in the vicinity of a continuous phase transition, 
whereas the exponential form was advocated in the context of the CP^ N ~^ model in 
Ref. [22]; we use it only for the topological charge. Even our high statistics data is not 
precise enough to accurately determine an effective critical exponent. However, with 
the power law, we get z ~ 5 for Q^, a very severe critical slowing down. The data 
is also not good enough to distinguish it from the exponential form, for which we find 
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Figure 3: Auto-correlation time of square Wilson loops as a function of their size, for 
simulation C2d. Smeared loops Wi(r, r) are marked as x, while + symobols show W(r, r). 
The line at the bottom shows tN ™ r } where N m is the number of trajectories of length 
r between two consecutives measurements. 




Figure 4: Auto-correlation time of Q\ and the (0.5 fm) x (0.5 fm) square Wilson loop as 
a function of the lattice spacing using the DD-HMC algorithm. For Q\ the two curves 
are fits through the last three points of the two ansatze of Eq. (4.1). For the Wilson 
loop only the fit to the power law has been performed, through all points. 
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Table 2: The average plaquette, the topological charge and its square along with auto- 
correlation times (computed with S = 3) of the smeared plaquette and the (squared) 
charge for our ensembles described in Table [TJ The last column gives the exponential 
auto-correlation time as defined in Eq. ( 2.30[ ). 
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a ~ 0.4fm. The Wilson loop, on the other hand, follows a power law with z ~ 0.6 within 
our range of data, which is a surprisingly mild behavior. This already demonstrates the 
decoupling discussed in Sect. [2} The Wilson loops decouple from the slow modes which 
couple strongly to the square of the charge. We will come back to this subject below. 
The exponent for the Wilson loops is compatible with the z = 1 for HMC in free field 
theory [23]. 
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x 

Figure 5: Auto-correlation time of Q\ in units of molecular dynamics time scaled by R 
as a function of the trajectory length for the 24 4 lattices at /3 = 6.0. We show the data 
for two block decompositions in the DD-HMC as well as data for HMC simulations. 
The curve is a fit through all points to the functional form c\/y/r + t/2. 

4-1-2 Dependence on trajectory length and block size 

This brings us to the discussion of the various parameters, on which this picture might 
depend: the trajectory length, the block decomposition and the physical volume. The 
dependence of T mt (Ql) on the trajectory length is visualized in Fig. [5]for the a « O.lfm 
lattices. It demonstrates that longer trajectories can lead to shorter auto-correlation 
times in units of molecular dynamics time, which takes into account the additional effort 
needed for the longer trajectories. That longer trajectories can improve the performance 
of the algorithm has been part of the original motivation for the Hybrid Molecular 
Dynamics [14] , and has since been demonstrated, e.g., in Ref. 124]. In free field theory 
it is known that the optimal trajectory length depends on the observable and typically 
increases when the correlation length increases |23| . As long as the system is in a regime 
with Ti nt S> r, one can argue that the momentum refreshment at the beginning of each 
trajectory initiates a random change of direction in the otherwise directed walk. One 




• HMC 

+ DD-HMC 12 x 6 3 
x DD-HMC 12 2 x6 2 
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then expects longer trajectories to decrease r; n t proportional to 1/y/r, but at most down 
to the smallest possible value of T; nt = r/2, which means ri n t = 1/2 in units of complete 
updates. This simple model describes the gross features of our data reasonably well. 
Also on the finer C lattices, given in Table [2j the corresponding improvement can be 
observed. 

The data of the figure are also collected in Table[2]together with those from different 
block decompositions and also from the HMC algorithm. We observe that the blocks 
do not have a measureable impact on the auto-correlation times beyond the simple 
rescaling with the active link ratio R. Of course, the blocks have to have a reasonable 
minimal size. Our smallest blocks are still at least 0.5fm across, which is around the 
pure gauge theory correlation length defined from the string tension. 

4-1-3 Dependence on volume and discretization 

Most of our ensembles have a constant physical volume with L = 2.2fm, for which finite 
size effects of typical equilibrium expectation values are known to be small in the pure 
gauge theory. In order to check for a potential L-dependence of auto-correlations, we 
also generated an L = 3.3 fm ensemble at /3 = 6.18. Figure [6] demonstrates that no 
significant volume dependence is present - neither for the smeared plaquette P\ nor for 
the squared charge Q\ . 




200 



Figure 6: Auto-correlation time of Q\ and the smeared plaquette P\ at /3 = 6.18 on a 
32 4 and a 48 4 lattice. 

The emerging picture might also depend on the particular discretization used. So 
far, all results were for the Wilson gauge action. Therefore, we also generated an 
ensemble with the Iwasaki action with a = 0.09fm, with the same volume and simulation 
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parameters, using the HMC algorithm in both cases. We observe a drastically larger 
auto-correlation time for the topological charge, 

Tint(Qi) = 34(4) for Wilson gauge action, (4.2) 

T mt(Qi) = 220(50) for Iwasaki gauge action (4.3) 

However, this is not replicated in other observables, both the plaquette and the smeared 
plaquette having roughly the same auto-correlation times for the two actions. 



4.2 The charge in subvolumes 

Ultimately one needs to find an algorithm with smaller auto-correlations. For this 
purpose it is important to understand more about how the HMC moves the gauge 
fields through configuration space. Of course this is a difficult problem, as we need to 
reformulate it in terms of specific (gauge invariant) observables. 

An interesting such question is whether topological charge is being moved from 
some space-time volume to another one more quickly than the total charge is changing. 



This can be looked at by restricting the sum in eq. (3.4) to a region 1Z, computing the 
charge inside that region 

Its MC history will show whether charge has flown in or out of the region. More 
quantitatively we can directly look at the auto-correlation function of Q^: as shown 
in Fig. [7j The subvolume charge does decorrelate significantly faster than the total 
charge, but there is still a quite significant coupling to the slow mode remaining. The 
decoupling coefficient C* is around 0.7 for the 16 x 32 sublattice and about C* = 0.15 
for the 16 4 subvolume. The latter is a significant suppression. 



4.3 Quenched approximation 

Considering phenomenological applications and access to different QCD observables, 
hadron correlation functions are more interesting observables than Wilson loops. In 
order to have very good statistics and observables which do not suffer from an intrinsi- 
cally large variance, we study pseudo-scalar correlation functions. For cost reasons this 
is done just on 64 x 32 3 lattices. As an example we perform a study similar to the one in 
Ref . |17| , where the mass and decay constant of the D s as well as the charm quark mass 
were investigated in the quenched approximation. Neglecting sea quark effects allows 
us to generate an ensemble with the high statistics necessary for detecting even small 
influences of the slow modes. However, it comes at the price that small quark masses 
are not possible without running into the problem of exceptional configurations [25] . 
Even at the mass of the strange quark which we take over from Ref. [17] , we observed at 
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Figure 7: Auto-correlation function of Q^, with 1Z being the full lattice, half the lattice 
(cut in one dimension) and a 16th of the lattice, cut in half in all dimensions. We used 
a sequence of 320000 MDU of the C2d run. 




Figure 8: Auto-correlation function of the mass of the cc' pseudo-scalar meson with 
m c = m c > = m c harm- The meson mass is obtained from a plateau average. The two 
dashed lines show the upper/lower bound region of the tail contribution to the normal- 
ized auto-correlation function, given by C* F e~ l l T * . 

least one clearly exceptional strange quark propagator in 40000 measurements. We dis- 
card suspicious configurations using the criterion that the fourth moment of the strange 



19 



pseudo-scalar correlator M4 with M n = aS^ r °2+a* n ^ pp W * s a * l eas t ten standard 
deviations away from the average value. 

We used the Wilson gauge action at = 6.2 on a 64 x 32 3 lattice, sec also Table [lj 
simulation C4d. The Wilson fermion action is non-perturbatively 0(a) improved [26] 
and we chose hopping parameters [l7j ^strange = 0.134959 and K c harm = 0.124637. The 
quark fields are anti-periodic in time. 

The large statistics allows us to accurately measure the auto-correlation function. 
In Figure [8] we show the example of the meson mass with the longest auto-correlation 
among those considered in this study: the pseudo-scalar cc' meson with m c i = m c = 
?richarm- The normalized auto-correlation function quickly falls to p(6) « 0.2, but then 
exhibits a long tail for which we get non-zero values up to t ~ 200. As will be discussed 



further in Sect. 4.4 the contribution of the slow mode to T m t is C^Texp ~ 50%. 
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Table 3: Auto-correlation times for the quenched strange and charm quark observables 
along with pure gauge observables on the C4 lattice and Wilson loop on the C2 lattice. 



The window Wi has been obtained by setting the S parameter in eq. (2.10) equal to 1.5 
and 3. Larger values of S correspond to larger windows. 

Other results are shown in Table [3} The important observation on this table is 
that the auto-correlation times for all observables F that we looked at are r in t{F) < 20 
except for the squared topological charge, for which we find Ti n t(Qi) ~ r int(Q|) ~ 200. 
Thus there is very good evidence that the effect of the slow mode, which is clearly 
visible in the charge, is strongly supressed in other observables. Still this supression 
should be verified for each new observable and the effect of the slow mode should be 
estimated. The different numbers for the Ti n t-estimates in the table illustrate that 
significant contributions by slow modes are present. We now turn to this issue of good 
error estimates. 
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Figure 9: The coefficient C* F {r), eq. (2.31 ), for F the squared, once smeared Wilson loop 
of size 0.5fmx0.5fm. Four different lattice spacings are shown in the pure gauge theory, 
from top to bottom: Aid, B3d, C2d, Did. 



4.4 Improved error estimates 

Our results of the previous two subsections call for improved error estimates (Sect. [2T2] ) 
where the contribution of a long tail of the auto-correlation functions is included. We 
discuss numerical results from both the pure gauge theory runs and the quenched run. 



Also the decoupling of Sect. 2.3 is demonstrated for these cases. 



In the pure gauge theory data we clearly see decoupling of the Wilson loops. Re- 
call from Fig. [3] that the maximum auto-correlation is present for the once-smeared 
0.5fmx0.5fm Wilson loop. In Fig. [9] we thus show C^fV) introduced in Eq. (2.31) for 
that size. The dependence on r (not to be confused with the size of the loop) is rather 
insignificant, while the plot shows how the amplitude (r) decreases at smaller lattice 
spacings, independent of any small residual variation with r. 

The contribution of the slowest mode to T mt is given by the product r exp C\y 1 ■ In 
order to analyse the critical behavior of this quantity we fix r = 0.5 and plot r*, our 
estimator for r exp , the coefficient (7^(0.5) as well as the combination r* against 



the lattice spacing in Fig. 10 We see the strong critical slowing down as an increase of 
r* by orders of magnitude, which is, however, basically compensated by the decoupling 
characterized by the critical behavior of Cyy (0.5). As a result, the contribution of the 
slow mode to the auto-correlation time of the Wilson loop stays small and no severe 
critical slowing down is observed in this quantity. 

We now turn again to pseudo-scalar correlation functions on the C4d lattice. We 
saw earlier that the largest auto-correlation is seen for the correlator of two distinct 
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Figure 10: The slow mode contribution T exp Cp(r), eq. (|2.31[), for F the squared, once 



smeared Wilson loop of size 0.5fmx0.5fm as a function of the lattice spacing. 



flavor but mass-degenerate quarks with the mass of the charm quark. Therefore we 



illustrate the statements made in Sect. 2.2 for this example as well as for the squared 
topological charge. The estimate of r exp to be inserted into eq. (2.24) and eq 



was already discussed in Sect. 2.2 Here we show eq. (2.24) in comparison to eq. 



2.25 



2.23 



as a function of the window size W u , W\. They are plotted together in Fig. [TTJ We 
see that r-^ t (P^) represents a much safer estimate of r- mt than T; nt , also at a somewhat 
small value of W n , which one might be forced to choose if the statistics is small. In the 
case of the topological charge squared, the auto-correlation function follows closely a 
single exponential decay already at rather small times. Hence the determination of Tint 



including the tail from eq. (2.24) is precise at values of W u which are much smaller than 



the one that is chosen by our criterion in Sect. 2.2 



The safer error estimate described in Sect. 2.2 is convincing in the case of a large 



statistics - on the C4d lattice the total statistics is around 1000 x r eX p- For significantly 
smaller sample sizes the error estimate will of course be less reliable. We tested the 
stability by dividing the total run into pieces of about 2500 (MDU • R) each, which is 



about 10r exp . The histograms in Fig. 12 show the distribution of both standard and 



the improved error estimate following exactly Sect. 2.2 The observable is again the cd 
pseudo-scalar mass. These distributions teach the following lesson. The improved error 



estimate of eq. (2.24) and eq. (2.25) is always safely close to the true error or somewhat 



above it, while eq. ( |2.23 ) with the standard window size typically underestimates the 
error - not so rarely by a factor two. An error estimate using r^ t is recommended. The 
histograms also remind us of an obvious fact: typically the error of the statistical error 
is not that small in QCD simulations. 
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Figure 11: Top: The error estimates T^ nt {Ql) and r£ t (Q§) according to Eqs. ( |2.23[ ) 
and (2.24) as a function of the respective window Wu u . Their values according to our 
prescription are indicated by the filled symbol points. Bottom: the same for P5. 
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Figure 12: Histogram distribution of the error bars (upper and lower bound) for the 
mass of the pseudo-scalar 075 c'. The central dashed lines show the error of the error 
from the total statistics. 

4.5 Full QCD 

As part of the CL^] effort, we have carried out two rather long N{ = 2 QCD runs 
with about 16000 MDU each, and with R = 0.37. The ensemble E5f is generated with 
r = 1/2 and E5g has r = 4. Both simulations describe the same physics, using the 
non-perturbatively O(a) improved action 27 at j3 = 5.3, k = 0.13625 on 64 x 32 3 



lattices. The lattice spacing is read off from r^/a 7 28 and we are close to m n ro = 1, 
which means a pion mass of around 400 MeV. We will compare directly to the CI lattice 
whose lattice spacing is matched to this value of ro/o. But first we illustrate the quality 
of the runs by some time histories in Fig. [I3j the runs contain many sign-flips of the 
topological charge. As expected the frequency of topology changes is better for the 
lower, r = 4, run than for the upper, r = 1/2, case. 

A one-to-one comparison of the auto-correlation functions, quenched vs. Nf = 2, 



is presented in Fig. 14 One observes a very similar decorrelation of all observables 
quenched and in full QCD, except for the squared topological charge which decorrelates 
much faster with dynamical fermions. Unfortunately we cannot offer a real theoretical 
understanding of this rather striking observation. However, note that the change of the 
gauge action (in the pure gauge theory) from Wilson plaquette action to Iwasaki action 
has a similar effect, namely the auto-correlations of Q 2 were strongly affected while 
auto-correlations of other observables are essentially unchanged. Among the effects of 
the introduction of the fermion determinant is a change of the effective gauge action in 



"https : //twiki . cern. ch/twiki/bin/view/CLS/ 
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Figure 13: Histories of the charge Q§ in simulations E5f with r = 0.5 (top) and E5g 
with r = 4 (bottom). The Monte Carlo time t is given in molecular dynamics time 
units. 

the ultraviolet. Beyond the leading /3-shift there are also dimension six terms and this 
"part" of the fermion determinant is the same as a change of the lattice gauge action. 

As we did for the pure gauge theory, we now come to the extraction of the expo- 
nential auto-correlation time, see Fig. 15. The estimator r^ p (t,F) = r p F (t/2) \ ls 

_J ' 2 log | p f W / 

shown for those observables with the slowest decorrelation out of our set. Clearly the 

determination of is difficult, but it seems possible. The location of r*, which we 



remind the reader is our estimate eq. (2.30) for r ex p, is indicated in the figure. 



The numbers for Ti nt and r* are listed in Table |4j We see again that auto-correlation 
times for long trajectories with length r = 4 are around a factor two smaller than those 
for t = 1/2. In the table we list numbers for t* determined just from the indicated 
observable for illustration. In our estimate of T^ t the maximum one is then taken into 



account as defined in eq. (2.27). The more observables one considers, the better (larger) 



the estimate of r ex p will get. Even if this is still below the true value of r ex p, it will 
provide us with a more realistic estimate of r; n t. 



4-5.1 Proposal for error estimates 

The numbers in Table [4] come from a rather long simulation. Such data is not always 
available. Here we propose how one may proceed in this situation, using a reasonable 



estimate of the contribution of the tail of the auto-correlation function. Eq. ( 2.24 ) should 



be used when an onset of a tail is visible in the data and we suggest to choose W u such 
that 5[/9f(W u )] ~ pp(W n )/3. On the other hand for auto-correlations which quickly fall 
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Figure 14: Comparison of the normalized auto-correlation function p(t) between 
quenched and dynamical simulations at the same value of r^/a ~ 7 for different ob- 
servables. The data is from the Cld and E5g runs, respectively. Top: Comparison for 
topological charge squared and plaquette. Center: Pseudo-scalar meson masses with 
mass-degenerate quarks of the charm quark mass on the left (0750') and strange quark 
mass on the right (575s'), extracted from plateau averages over x$ 6 [23a, 27a]. Bottom: 
Auto-correlation functions of PC AC quark masses at xq = 24a. 
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Figure 15: Estimators for the exponential auto-correlation time from smeared plaquette 
and topological charge in pure gauge (Cld) and dynamical (E5g) simulations. 



off, eq. (2.25) is recommended. With low statistics an estimate of r eX p, needed for these 
formulae, is impossible to obtain. We therefore suggest to use a value for a not-so-small 
a with good statistics together with the scaling observed in the pure gauge theory. For 
our 0(a)-improved action at N{ = 2 we hence suggest an a -5 ~ exp(7/3) scaling (see 
eq.(4.5) of [29]). Together with r ex p ~ 40 at (3 = 5.3, this leads for the action of [27] t 



Rr exp « 200 exp(7(/3 - 5.5))MDU . (4.5) 

For safety reasons, one may attach an error of a factor 2 to this estimate and should 
of course be aware of the assumptions made above. The best situation is an observable 
with a strong decoupling, i.e. a small auto-correlation function pf(W u ) or pf{Wq), for 



which the intrinsic uncertainties of the model eq. (4.5) are not that relevant. 



5 Summary and conclusions 

In this paper we have established a very severe critical slowing down of the topological 
charge in pure Yang-Mills theory when using the (DD)-HMC algorithm. A dynamical 
critical exponent of z = 5 means that, at constant volume, the full simulation scales 
with o -10 at least, since an HMC type algorithm is expected to scale with a -4 from 
the increasing number of lattice points and typically an additional factor of a -1 from 
the decreasing step size. However, we also investigated Wilson loops, which are more 
commonly in the focus of interest. They are not affected in the same way, exhibiting 

6 To exclude any confusion, we here put the units explicitly which we have been using throughout. 
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Ql 


P 5 


TAG 


(Ql) 


4 

a Xt 


'int 


'int 




'int 


'int 


r* 


Cld 


50(4) 


2.4(2) xlO" 5 


137(25) 


134(15) 


140(18) 


38(4) 


43(5) 


74(11) 


E5f 


17.3(1.8) 


0.82(9) xl0~ 5 


16(4) 


23(5) 


21(5) 


84(31) 


66(13) 


66(19) 


E5g 


18.9(1.5) 


0.90(7) xl0~ 5 


10(2) 


14(3) 


15(4) 


29(8) 


29(5) 


39(12) 



Table 4: Comparison between the integrated auto-correlation times for the topological 
charge and the smeared plaquette between the dynamical and the corresponding pure 
gauge ensemble. The dynamical runs E5f and E5g have trajectory length r = 0.5 and 
r = 4, respectively. The pure gauge run Cld has r = 4. Also the value of the topological 
susceptibility, Xt, is given. 



a much milder slowing down while approaching the continuum limit (z ~ 0.5. ..1). 
Martin Liischer investigated observables after integrating the Wilson flow [130 j for some 
distance which removes UV fluctuations. These observables effectively show z ~ 2 [3l] , 
a critical slowing down in between Q 2 and the Wilson loops. 

We have also considered observables formed from pseudoscalar correlation func- 
tions, both in a quenched setting and for Nf = 2. These quantities are of immediate 
interest and at the same time not plagued by large UV fluctuations. At a lattice spacing 
of a ~ 0.07fm their autocorrelation functions are much better behaved than the one of 
Q\ but a weak coupling to the slow mode is seen in Fig. 14. Unfortunately, a systematic 



study as a function of the lattice spacing and quark mass is prohibitively expensive, but 
we expect that these observables continue to couple only weakly to the slow mode and 
their slowing down is significantly less severe than for the topological charge. 

On the one hand, this is encouraging. In practical simulations, unless we are 
interested in the slow observables themselves, we do not need to gather enough statistics 
to accurately determine their auto-correlation time. It is sufficient to have a decent 
sampling in the slow modes to assure practical ergodicity, i.e. a few times their auto- 
correlation times is needed. On the other hand, the danger remains that there are 
even slower modes which are so slow that the corresponding fluctuations do not show 
in the full runtime. The only way to study this is to start simulations in parameter 
space, which can be considered safe and then move in small steps towards the critical 
points, monitoring a large number of observables and relying on the continuity of auto- 
correlations in terms of the system's parameters. 

Even if the coupling to the slow modes may be small it is important for a correct 
error analysis. We described a practical method to take these effects into account. 
It relies on the fact that information about r exp can be obtained through observables 
which couple strongly to the corresponding mode. Under these circumstances, the error 
analysis can be made significantly safer. 

Still, a true solution to the critical slowing down has to be an algorithmic one which 
at least solves the problem regarding the topological charge. The dramatic progress in 
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the fermion algorithms which the field has witnessed during the last decade gives us 
hope that this can actually be achieved. 
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A Error of the Error 

An introduction to the error analysis of correlated data from a Markov chain with 
references can be found in [8] while the case of functions of the primary observables is 



treated in 10 . Here we review the main formulae for estimating the error of the error 



from these references and make those explicit which are not given there. 



Our estimator for the mean, eq. (2.2), and the auto-correlation function, eq. (2.4), 

are 



O, 



1 - 

-J20 a (q t ), F = F(O a ) (A.l) 



t=i 

N-\t\ N 



FF 



(t) = -^TTi E dF{q u )dF'{ qu+t ) + ^Y.dF{q u )dF'{ qu ), (A.2) 



JV — 1*1 ^ N 2 

u=l u=l 



where 

dF{q) = ^F a (O tt ( g )-O a ) (A.3) 



which for Tff' contains a bias correction discussed in 10 . In the computation of our 



auto-correlation times, e.g. eq. (2.24), we replace T by its estimator T. 

In Sect.|] we introduced various quantities which are functions G({r^(i)}). Their 
error is computed from simple error propagation 



A,B 

where A = (F, u) and B = (F' , v) collect the observable and time variable and run over 
all components of the auto-correlation functions. The covariance matrix T,ab is given 
by 

N N 
s=l t=l 
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Neglecting the completely connected part of the fourth moment in eq. (A.5), we have 
the approximation 

- 1 °° 

Sab ~ -tj T FF '(m + v)T FF /(m + u) +T FF '(m + v)T FF i(m - u) , (A. 6) 



m=— 00 



which in practice we evaluate with the cut eq. (A.IO) to be discussed below. 

In some simple cases, approximations can be applied to eq. (A.4) to derive more 
compact error formulae, for example [9,13 



(A.7) 
(A.8) 

(A.9) 



1 x 

(tp(t)f ~ ^ E ^ m + *) + p( m - *) - 2 pMp(*)) s 



m=l 



Furthermore we used the approximation 



(^nt) 2 ~ (ST int (W u )) 2 + T*(6p(W u )Y + p 2 (W u )(6r exp Y 



for the error of eq. ( |2.24 ) after checking that the left out cross terms are negligible. 
Also in the case of eq. (2.29) a similar approximation has been applied. For all other 
quantities we remained directly with eq. (A.4). 



As mentioned, the evaluation of the sum in eq. (A. 6) and also the one in eq. (A.8) 
require the introduction of a cutoff on the sum over m. This can be done with the 
function 

ff(x) \x\ < We 
I Id > W E . 



T(x) 



(A.IO) 



(and p) where we omit the subscripts to keep notation light. The computation of E(it, v) 
is then carried out as the sum of terms Y^ m r(m)r(m+t) at times t = u+v and t = u—v: 



£(it, v) ~ — (m)f (m + u + v) + f (m)f (m + 



u — v) 



(A.ll) 



that can be done in at most 0(W|) operations. The size of We can be discussed in 
similar manner as we have done about window sizes for the auto-correlation function. 
In our analysis we take Ws = W, with W given by prescription eq. (2.11), however, 
using We = 2W gives similar results. 

In principle one could think about adding a tail contribution to eq. (A.IO), but this 
goes too far, even with the statistics we had at hand. 
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